    Clustering Arabic Tweets for Sentiment Analysis

    The focus of this study is to evaluate the impact of linguistic preprocessing and similarity functions on the clustering of Arabic Twitter tweets. The experiments apply an optimized version of the standard K-Means algorithm to assign tweets to positive and negative categories. The results show that root-based stemming has a significant advantage over light stemming in all settings. The Averaged Kullback-Leibler Divergence similarity function clearly outperforms the Cosine, Pearson Correlation, Jaccard Coefficient and Euclidean functions. The combination of the Averaged Kullback-Leibler Divergence and root-based stemming achieved the highest purity of 0.764, while the second-best purity was 0.719. These results are important because they run contrary to findings for normal-sized documents, where, in many information retrieval applications, light stemming performs better than root-based stemming and the Cosine function is commonly used.
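
    The abstract does not spell out the exact formulation of the Averaged Kullback-Leibler Divergence or of the purity measure. The sketch below shows one common smoothed, symmetrised variant of the divergence used in text clustering (taken against the mean of the two term distributions) together with a standard purity computation; both functions are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def avg_kl_divergence(p, q, eps=1e-12):
    # p and q are term-frequency vectors for two tweets; they are first
    # normalised to probability distributions. Divergence is measured
    # against the mean distribution m so the value stays finite and
    # symmetric (a smoothed, Jensen-Shannon-style construction).
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log((p + eps) / (m + eps)))
    kl_qm = np.sum(q * np.log((q + eps) / (m + eps)))
    return 0.5 * (kl_pm + kl_qm)  # lower value = more similar

def purity(cluster_labels, true_labels):
    # Purity: each cluster is credited with its majority class, and the
    # sum of those majority counts is divided by the number of documents.
    total = 0
    for c in set(cluster_labels):
        members = [t for cl, t in zip(cluster_labels, true_labels) if cl == c]
        total += max(members.count(t) for t in set(members))
    return total / len(true_labels)
```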

    Revisiting the conclusion instability issue in software effort estimation

    Conclusion instability is the failure to observe the same effect under varying experimental conditions. Deep Neural Network (DNN) and ElasticNet software effort estimation (SEE) models were applied to two SEE datasets with a view to resolving the conclusion instability issue and assessing the suitability of ElasticNet as a viable SEE benchmark model. Results were mixed: both model types attained conclusion stability on the Kitchenham dataset, whereas conclusion instability persisted on the Desharnais dataset. ElasticNet was outperformed by DNN and is therefore not recommended as an SEE benchmark model.
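
    Model configurations and preprocessing are not given in the abstract. The sketch below shows one plausible scikit-learn setup for this kind of benchmark comparison; the file name, target column and hyperparameters are hypothetical placeholders.

```python
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("desharnais.csv")          # hypothetical file name
X = df.drop(columns=["Effort"])             # "Effort" assumed as the target column
y = df["Effort"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "ElasticNet": ElasticNet(alpha=1.0, l1_ratio=0.5),
    "DNN": MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, "MAE:", mean_absolute_error(y_te, model.predict(X_te)))
```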

    Evaluation of IT Service Desk: A Case Study

    Organisations rely heavily on Information Technology Service Management (ITSM) to provide efficient, high-quality services to all stakeholders. This research is an exploratory case study of the service desk operations model. It explores simple metrics and a weighted requirement matrix for evaluating and selecting ITSM systems. Several data-gathering tools, including brainstorming, interviews, participant observation and a collaborative feedback document, were employed to collect requirements from stakeholders and to ensure the viability and robustness of the research. Prominent challenges to the sound implementation of a suitable service desk suite have been identified and tabulated. These challenges, coupled with stakeholder feedback, enabled the researchers to arrive at a scaled selection framework for choosing an ITSM system. A comparison of eleven state-of-the-art service desk systems was also completed as part of the research. The research further proposes a novel service desk process, with specific emphasis on the roles played by various stakeholders in the provision of an efficient service desk operation.
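
    The weighted requirement matrix itself is not reproduced in the abstract. The sketch below illustrates the general technique, with hypothetical requirements, stakeholder weights and candidate scores.

```python
# Each candidate ITSM system is scored per requirement, scores are
# multiplied by stakeholder-assigned weights, and the weighted totals
# are compared. All values below are illustrative only.
requirements = {          # requirement -> stakeholder weight
    "incident management": 5,
    "self-service portal": 3,
    "reporting":           4,
    "integration APIs":    2,
}
candidates = {            # system -> score (0-10) per requirement, in order
    "System A": [8, 6, 7, 5],
    "System B": [9, 4, 8, 9],
}
weights = list(requirements.values())
for name, scores in candidates.items():
    total = sum(w * s for w, s in zip(weights, scores))
    print(f"{name}: weighted score = {total}")
```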

    Using Bisect K-Means Clustering Technique in the Analysis of Arabic Documents

    In this article, I investigate the performance of the bisect K-means clustering algorithm compared to the standard K-means algorithm in the analysis of Arabic documents. The experiments included five commonly used similarity and distance functions (Pearson correlation coefficient, cosine, Jaccard coefficient, Euclidean distance, and averaged Kullback-Leibler divergence) and three leading stemmers. Using the purity measure, the bisect K-means clearly outperformed the standard K-means in all settings, with varying margins. For the bisect K-means, the best purity reached 0.927 when using the Pearson correlation coefficient function, while for the standard K-means, the best purity reached 0.884 when using the Jaccard coefficient function. Removing stop words significantly improved the results of the bisect K-means but produced only minor improvements in the results of the standard K-means. Stemming provided a further minor improvement in all settings except the combination of the averaged Kullback-Leibler divergence function and the root-based stemmer, where purity deteriorated by more than 10%. These experiments were conducted on a dataset with nine categories, each containing 300 documents.
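
    The article's implementation is not shown here, but the core bisecting idea is standard: repeatedly split the largest cluster with a 2-means step until the target number of clusters is reached. A minimal sketch under that splitting strategy (scikit-learn 1.1+ also ships a ready-made BisectingKMeans class):

```python
import numpy as np
from sklearn.cluster import KMeans

def bisect_kmeans(X, n_clusters, random_state=0):
    # Start with every document in one cluster, then repeatedly split
    # the largest cluster into two until n_clusters is reached.
    labels = np.zeros(len(X), dtype=int)
    while labels.max() + 1 < n_clusters:
        biggest = np.bincount(labels).argmax()
        idx = np.where(labels == biggest)[0]
        sub = KMeans(n_clusters=2, n_init=10, random_state=random_state)
        sub_labels = sub.fit_predict(X[idx])
        # one half keeps the old cluster id, the other gets a fresh id
        labels[idx[sub_labels == 1]] = labels.max() + 1
    return labels
```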

    Remote sensing dataset for detecting cows from high resolution aerial images

    Details:
    - Total number of images: 655
    - Total number of cows: 29,803
    - Images: aerial RGB images with a spatial resolution of 0.1 m; each image is 500 x 500 pixels, corresponding to 50 m x 50 m on the ground
    - Labels: a point annotation for each visible cow in the image
    - Image source: Land Information New Zealand, 2016-2017, https://zenodo.org/record/590886

    Automatic counting of chickens in confined area using the LCFCN algorithm

    Grouping chickens based on their weights is an important process in many chicken farms in New Zealand, where chickens are grouped into three categories: small, medium and large. Each category has pens (cages) to temporarily hold the chickens during the process and a permanent, larger section to hold them after grouping. Chickens are weighed, placed in the respective pens, and thereafter released to the permanent section. Currently, the chickens are counted manually as they are released from a pen to a larger section. The task of weighing chickens, placing them in a pen and releasing them to a larger section is repeated until all chickens have been moved to their respective sections and the total number of chickens in each section has been calculated. This manual effort involves several employees and takes several hours. This study investigated the feasibility of using deep learning algorithms to replace the manual counting. We applied the localized fully convolutional network (LCFCN) algorithm to count and locate chickens in images of the pens. LCFCN was applied to a dataset of 4,092 images containing 114,132 chickens. The algorithm was evaluated using the Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE) metrics, achieving values of 0.5592, 1.36% and 1.67 respectively, which are promising results in this setting. Furthermore, we modified the implementation of LCFCN to enable a user to manually alter the predicted labels to guarantee error-free counting and localization.
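
    The three reported metrics are standard. The sketch below shows how they would be computed over per-image counts; the example numbers are illustrative only, not the study's data.

```python
import numpy as np

def count_metrics(y_true, y_pred):
    # MAE, MAPE (percent) and RMSE over per-image count predictions.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err) / y_true) * 100
    rmse = np.sqrt(np.mean(err ** 2))
    return mae, mape, rmse

mae, mape, rmse = count_metrics([30, 28, 27], [30, 29, 26])
print(f"MAE={mae:.4f}  MAPE={mape:.2f}%  RMSE={rmse:.2f}")
```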

    Assessing learning outcomes of course descriptors containing object-oriented programming concepts

    This study follows well-published educational criteria for assessing the quality of learning outcomes and investigates how these criteria are applied to descriptors of courses that include object-oriented programming concepts. These quality criteria aim to ensure that learning outcomes are specific, measurable, attainable, relevant and time-scaled. The study examined course descriptors from all universities in New Zealand and found a significant gap between the published criteria and the learning outcomes: the outcomes are widely open to interpretation and do not meet the criteria, and the minimal level of detail they provide is insufficient to satisfy the stated criteria. The study then presents a new, more detailed implementation of outcomes, augmented with an assessment structure and marking criteria, to show that adding more detail significantly increases the complexity of course descriptors. This highlights the need for robust discussions between the writers of course descriptors, who look for simplicity and flexibility, and quality assessors, who expect precision and specifics.